Analysing and Visualising the Indian Movies dataset

The following code analyses and visualises the IMDB dataset of Indian movies. It is divided into four sections:

  1. Importing Dataset
  2. Analysing Dataset
  3. Cleaning Data
  4. Plotting Data

Importing all required libraries

Importing Data
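Below is a minimal sketch of the import step. It assumes pandas and a CSV with the columns Name, Year, Duration, Genre, Rating, Votes, Director, Actor 1, Actor 2 and Actor 3. These column names are an assumption, not something taken from the notebook. To keep the example self-contained, it reads a few hypothetical rows from an in-memory string. `load_movies` is an illustrative helper name.

```python
import io

import pandas as pd

# Hypothetical sample rows in the assumed column layout. Years carry brackets
# and durations carry a trailing 'min', as described in the cleaning notes.
SAMPLE_CSV = """Name,Year,Duration,Genre,Rating,Votes,Director,Actor 1,Actor 2,Actor 3
Movie A,(2019),109 min,Drama,7.0,8,Director X,Actor P,Actor Q,Actor R
Movie B,(2019),110 min,"Comedy, Romance",4.4,35,Director Y,Actor P,Actor S,Actor T
Movie B,(2019),110 min,"Comedy, Romance",4.4,35,Director Y,Actor P,Actor S,Actor T
Movie C,(2021),90 min,"Drama, Musical",,,Director Z,Actor Q,Actor R,Actor U
Movie D,(2020),147 min,"Action, Drama",6.1,1002,Director X,Actor P,Actor Q,Actor V
"""


def load_movies(source) -> pd.DataFrame:
    """Read the movies CSV from a file path or a file-like object."""
    return pd.read_csv(source)


movies = load_movies(io.StringIO(SAMPLE_CSV))
print(movies.shape)
```

With the real file, you would call `load_movies("path/to/movies.csv")` instead. That path is a placeholder.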

Analysing Dataset
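The notebook relies on a Pandas Profiling report for this step. A rough pandas-only stand-in can compute the headline figures such a report surfaces: duplicate rows, missing values per column and column dtypes. This is not the profiling library itself. `quick_profile` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Summarise duplicates, missing values and dtypes, roughly the
    headline figures a profiling report shows."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Hypothetical sample rows; the notebook would profile the loaded CSV instead.
sample = pd.DataFrame(
    {
        "Name": ["Movie A", "Movie B", "Movie B", "Movie C"],
        "Year": ["(2019)", "(2019)", "(2019)", "(2021)"],
        "Rating": [7.0, 4.4, 4.4, np.nan],
        "Director": ["Director X", "Director Y", "Director Y", None],
    }
)

report = quick_profile(sample)
print(report["duplicate_rows"], report["missing_per_column"])
```

The full report comes from the profiling package itself, which is now distributed as `ydata-profiling`. You create it with `from ydata_profiling import ProfileReport` followed by `ProfileReport(df)`.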

Cleaning Data

The Pandas Profiling report leads to the following conclusions:

  1. The dataset contains duplicate rows, which need to be removed.
  2. Many values are missing. A movie without a rating, year or director name makes the data hard to visualise, so such rows need to be dropped.
  3. The year field is wrapped in brackets and the duration field ends in 'min'. Both must be stripped so that these columns have datatype 'int' rather than 'string'.
  4. The Genre column holds multiple genres separated by commas. These need to be split into multiple columns.
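The four cleaning steps could be sketched as a single function. The sketch makes several assumptions:

- The columns are named Year, Duration, Genre, Rating and Director.
- Years are formatted like `(2019)` and durations like `109 min`.
- Rows with no rating, year or director are dropped, matching point 2.

For point 4, `str.get_dummies` turns the comma-separated genres into one 0/1 column per genre. This is one way of splitting them into multiple columns. `clean_movies` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps listed above (column names assumed)."""
    # 1. Drop exact duplicate rows.
    out = df.drop_duplicates()
    # 2. Drop rows missing the fields the plots depend on.
    out = out.dropna(subset=["Year", "Rating", "Director"]).copy()
    # 3. Strip the brackets from Year and the trailing 'min' from Duration,
    #    then convert to integers (nullable Int64 keeps missing durations).
    out["Year"] = out["Year"].str.strip("()").astype(int)
    out["Duration"] = (
        out["Duration"]
        .str.replace("min", "", regex=False)
        .str.strip()
        .astype(float)
        .astype("Int64")
    )
    # 4. Split the comma-separated Genre string into one 0/1 column per genre.
    genres = out["Genre"].str.get_dummies(sep=", ")
    return out.join(genres)


# Hypothetical raw rows: one duplicate, one missing rating, one missing duration.
raw = pd.DataFrame(
    {
        "Name": ["Movie A", "Movie B", "Movie B", "Movie C", "Movie D"],
        "Year": ["(2019)", "(2019)", "(2019)", "(2021)", "(2020)"],
        "Duration": ["109 min", "110 min", "110 min", "90 min", np.nan],
        "Genre": ["Drama", "Comedy, Romance", "Comedy, Romance",
                  "Drama, Musical", "Action, Drama"],
        "Rating": [7.0, 4.4, 4.4, np.nan, 6.1],
        "Director": ["Director X", "Director Y", "Director Y",
                     "Director Z", "Director X"],
    }
)

clean = clean_movies(raw)
print(clean[["Name", "Year", "Duration"]])
```

The nullable `Int64` dtype is used for Duration so that a movie with a known rating but no recorded duration can stay in the data.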

Plotting Data

Several graphs are plotted to visualise how the data changes over the years.

Number of movies by year of launch
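Here is a minimal sketch, assuming the cleaned data has an integer `Year` column. It counts movies per year and draws the counts as a line chart. The chart type is a guess, because the notebook's actual plot is not shown. The function names and sample data are hypothetical. matplotlib is imported only inside the plotting function, so the counting step runs without it.

```python
import pandas as pd


def movies_per_year(df: pd.DataFrame) -> pd.Series:
    """Count movies released in each year, ordered by year."""
    return df["Year"].value_counts().sort_index()


def plot_movies_per_year(counts: pd.Series):
    # Lazy import: matplotlib is only needed for drawing.
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="line", figsize=(12, 5), marker="o")
    ax.set_xlabel("Year of launch")
    ax.set_ylabel("Number of movies")
    ax.set_title("Number of movies by year of launch")
    plt.tight_layout()
    return ax


# Hypothetical cleaned data with integer years.
clean = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Year": [2019, 2020, 2019, 2021]})
counts = movies_per_year(clean)
print(counts.to_dict())  # {2019: 2, 2020: 1, 2021: 1}
```

In a notebook, `plot_movies_per_year(counts)` renders the chart.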

Genres through the Years
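One way to track genres over time assumes integer years and a comma-separated `Genre` string. The sketch expands genres into indicator columns, sums them per year, and draws a stacked area chart. A movie with several genres is counted once under each of them. The names and sample rows are hypothetical.

```python
import pandas as pd


def genres_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Return a Year x Genre table counting movies per genre per year."""
    dummies = df["Genre"].str.get_dummies(sep=", ")
    # Group the indicator columns by the (index-aligned) Year column and sum.
    return dummies.groupby(df["Year"]).sum().sort_index()


def plot_genres_by_year(table: pd.DataFrame):
    # Lazy import: matplotlib is only needed for drawing.
    import matplotlib.pyplot as plt

    ax = table.plot(kind="area", stacked=True, figsize=(12, 6))
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of movies")
    ax.set_title("Genres through the years")
    ax.legend(loc="upper left", ncol=2, fontsize="small")
    plt.tight_layout()
    return ax


# Hypothetical cleaned rows.
clean = pd.DataFrame(
    {"Year": [2019, 2019, 2020], "Genre": ["Drama", "Comedy, Drama", "Action"]}
)
table = genres_by_year(clean)
print(table)
```

A stacked area chart shows both the total output per year and each genre's share of it. A line per genre is easier to read when the genres differ greatly in size.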

Top 20 actors by the number of movies made
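This sketch assumes the cast is spread over three columns named `Actor 1`, `Actor 2` and `Actor 3`. The columns are concatenated into one series and counted, and `value_counts` skips missing entries. Do not rely on the order among actors who are tied at the cut-off. The names and sample rows are hypothetical.

```python
import pandas as pd

ACTOR_COLUMNS = ["Actor 1", "Actor 2", "Actor 3"]  # assumed column names


def top_actors(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Count movies per actor across all actor columns and keep the top n."""
    all_actors = pd.concat([df[col] for col in ACTOR_COLUMNS], ignore_index=True)
    return all_actors.value_counts().head(n)


def plot_top_actors(counts: pd.Series):
    # Lazy import: matplotlib is only needed for drawing.
    import matplotlib.pyplot as plt

    # Reverse so the busiest actor ends up at the top of the horizontal bars.
    ax = counts.iloc[::-1].plot(kind="barh", figsize=(8, 8))
    ax.set_xlabel("Number of movies")
    ax.set_title("Top actors by number of movies made")
    plt.tight_layout()
    return ax


# Hypothetical cleaned rows.
clean = pd.DataFrame(
    {
        "Actor 1": ["Actor P", "Actor P", "Actor Q"],
        "Actor 2": ["Actor Q", "Actor R", "Actor P"],
        "Actor 3": [None, "Actor S", None],
    }
)
top = top_actors(clean, n=2)
print(top)
```

For the notebook's chart, `plot_top_actors(top_actors(clean))` would use the default `n=20`.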

Top 20 actors by number of movies made through the years
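This sketch builds on the same assumed actor columns to follow the top actors across years. It reshapes the data to one row per (Year, Actor) pair, keeps only the `n` busiest actors overall, and cross-tabulates. The names and sample rows are hypothetical, and the line chart is only one possible presentation.

```python
import pandas as pd

ACTOR_COLUMNS = ["Actor 1", "Actor 2", "Actor 3"]  # assumed column names


def top_actors_by_year(df: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Year x Actor table of movie counts, restricted to the n busiest actors."""
    # Long format: one row per (Year, Actor) pair, missing actors removed.
    long = df.melt(
        id_vars="Year", value_vars=ACTOR_COLUMNS, value_name="Actor"
    ).dropna(subset=["Actor"])
    top = long["Actor"].value_counts().head(n).index
    subset = long[long["Actor"].isin(top)]
    return pd.crosstab(subset["Year"], subset["Actor"])


def plot_top_actors_by_year(table: pd.DataFrame):
    # Lazy import: matplotlib is only needed for drawing.
    import matplotlib.pyplot as plt

    ax = table.plot(kind="line", figsize=(14, 6))
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of movies")
    ax.set_title("Top actors by number of movies made through the years")
    ax.legend(loc="upper left", ncol=2, fontsize="small")
    plt.tight_layout()
    return ax


# Hypothetical cleaned rows with integer years.
clean = pd.DataFrame(
    {
        "Year": [2019, 2019, 2020],
        "Actor 1": ["Actor P", "Actor P", "Actor Q"],
        "Actor 2": ["Actor Q", "Actor R", "Actor P"],
        "Actor 3": [None, "Actor S", None],
    }
)
table = top_actors_by_year(clean, n=2)
print(table)
```

Years in which a top actor made no movies appear as zeros in the cross-tabulation, so each line stays continuous.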